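Accurate segmentation of electron microscopy (EM) volumes of the brain is essential to characterize neuronal structures at the cell or organelle level. Although supervised deep learning methods have led to major breakthroughs in this direction over the past few years, they usually require large amounts of annotated data for training, and they perform poorly on other data acquired under similar experimental and imaging conditions. This problem is known as domain adaptation, since models learned from one sample distribution (the source domain) struggle to maintain their performance on samples drawn from a different distribution (the target domain). In this work, we address the complex case of deep learning based domain adaptation for mitochondria segmentation across EM datasets from different tissues and species. We present three unsupervised domain adaptation strategies to improve mitochondria segmentation in the target domain, based on (1) state-of-the-art style transfer between the images of both domains; (2) self-supervised learning to pre-train a model using unlabeled source and target images, followed by fine-tuning with the source labels only; and (3) multi-task neural network architectures trained end-to-end with both labeled and unlabeled images. Additionally, we propose a new training stopping criterion based on morphological priors obtained exclusively in the source domain. We carried out all possible cross-dataset experiments using three publicly available EM datasets, and evaluated the proposed strategies on the mitochondria semantic labels predicted on the target datasets. The methods introduced here outperform the baseline methods and compare favorably to the state of the art. In the absence of validation labels, monitoring our proposed morphology-based metric is an intuitive and effective way to stop the training process and select models that are optimal on average.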
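Segmenting 3D cell nuclei from microscopy image volumes is critical for biological and clinical analysis, enabling the study of cellular expression patterns and cell lineages. However, current datasets of neuronal nuclei usually contain volumes smaller than $10^{-3}\ mm^3$ with fewer than 500 instances per volume, which cannot reveal the complexity of large brain regions and restricts the investigation of neuronal structures. In this paper, we push the task forward to the sub-cubic millimeter scale and curate the NucMM dataset with two fully annotated volumes: one $0.1\ mm^3$ electron microscopy (EM) volume containing nearly the entire zebrafish brain with around 170,000 nuclei; and one $0.25\ mm^3$ micro-CT (uCT) volume containing part of a mouse visual cortex with about 7,000 nuclei. With two imaging modalities and significantly increased volume size and instance numbers, we discover a great diversity of neuronal nuclei in appearance and density, introducing new challenges to the field. We also perform a statistical analysis to illustrate these challenges quantitatively. To tackle the challenges, we propose a novel hybrid-representation learning model that combines a foreground mask, a contour map, and a signed distance transform to produce high-quality 3D masks. Benchmark comparisons on the NucMM dataset show that our proposed method significantly outperforms state-of-the-art nuclei segmentation approaches. Code and data are available at https://connectomics-bazaar.github.io/proj/nucmm/index.html.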
Spacecraft pose estimation is a key task to enable space missions in which two spacecraft must navigate around each other. Current state-of-the-art algorithms for pose estimation employ data-driven techniques. However, there is an absence of real training data for spacecraft imaged in space conditions due to the costs and difficulties associated with the space environment. This has motivated the introduction of 3D data simulators, solving the issue of data availability but introducing a large gap between the training (source) and test (target) domains. We explore a method that incorporates 3D structure into the spacecraft pose estimation pipeline to provide robustness to intensity domain shift, and we present an algorithm for unsupervised domain adaptation with robust pseudo-labelling. Our solution ranked second in both categories of the 2021 Pose Estimation Challenge organised by the European Space Agency and Stanford University, achieving the lowest average error over the two categories.
Petrov-Galerkin formulations with optimal test functions allow for the stabilization of finite element simulations. In particular, given a discrete trial space, the optimal test space induces a numerical scheme delivering the best approximation in terms of a problem-dependent energy norm. This ideal approach has two shortcomings: first, we need to explicitly know the set of optimal test functions; and second, the optimal test functions may have large supports, inducing expensive dense linear systems. Nevertheless, parametric families of PDEs are an example where it is worth investing some (offline) computational effort to obtain stabilized linear systems that can be solved efficiently, for a given set of parameters, in an online stage. Therefore, as a remedy for the first shortcoming, we explicitly compute (offline) a function mapping any PDE-parameter to the matrix of coefficients of optimal test functions (in a basis expansion) associated with that PDE-parameter. Next, as a remedy for the second shortcoming, we use low-rank approximation to hierarchically compress the (non-square) matrix of coefficients of optimal test functions. In order to accelerate this process, we train a neural network to learn a critical bottleneck of the compression algorithm (for a given set of PDE-parameters). When solving the resulting (compressed) Petrov-Galerkin formulation online, we employ a GMRES iterative solver with inexpensive matrix-vector multiplications thanks to the low-rank features of the compressed matrix. We perform experiments showing that the full online procedure is as fast as the original (unstable) Galerkin approach. In other words, we get the stabilization with hierarchical matrices and neural networks practically for free. We illustrate our findings by means of 2D Eriksson-Johnson and Helmholtz model problems.
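The claim that low-rank structure makes matrix-vector products inexpensive can be illustrated with a minimal sketch. It assumes a single block stored in factored form as U V^T (U is m x r, V is n x r); the function names are hypothetical and this is not the authors' hierarchical-matrix code, which would apply the same idea block by block.

```python
def dense_matvec(A, x):
    """Reference product y = A x for a dense m x n matrix (about m*n multiplications)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def low_rank_matvec(U, V, x):
    """Compute (U V^T) x without forming the dense matrix.

    U is m x r and V is n x r (lists of rows). The product is evaluated as
    U (V^T x), which costs about r*(m + n) multiplications.
    """
    r = len(U[0])
    # t = V^T x, a vector of length r
    t = [sum(V[j][k] * x[j] for j in range(len(V))) for k in range(r)]
    # y = U t, a vector of length m
    return [sum(row[k] * t[k] for k in range(r)) for row in U]


def expand(U, V):
    """Form the dense matrix U V^T; used here only to check the factored product."""
    return [[sum(u[k] * v[k] for k in range(len(u))) for v in V] for u in U]


if __name__ == "__main__":
    U = [[1, 0], [0, 1], [1, 1]]          # m = 3, r = 2
    V = [[1, 2], [3, 4], [5, 6], [7, 8]]  # n = 4, r = 2
    x = [1, 1, 1, 1]
    print(low_rank_matvec(U, V, x))
    print(dense_matvec(expand(U, V), x))
```

The saving only appears when the rank r is much smaller than min(m, n); for the tiny matrices in the example the two costs are comparable.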
To alleviate the problem of structured databases' limited coverage, recent task-oriented dialogue systems incorporate external unstructured knowledge to guide the generation of system responses. However, these usually use word or sentence level similarities to detect the relevant knowledge context, which only partially capture the topical level relevance. In this paper, we examine how to better integrate topical information in knowledge grounded task-oriented dialogue and propose ``Topic-Aware Response Generation'' (TARG), an end-to-end response generation model. TARG incorporates multiple topic-aware attention mechanisms to derive the importance weighting scheme over dialogue utterances and external knowledge sources towards a better understanding of the dialogue history. Experimental results indicate that TARG achieves state-of-the-art performance in knowledge selection and response generation, outperforming previous state-of-the-art by 3.2, 3.6, and 4.2 points in EM, F1 and BLEU-4 respectively on Doc2Dial, and performing comparably with previous work on DSTC9; both being knowledge-grounded task-oriented dialogue datasets.
Video provides us with the spatio-temporal consistency needed for visual learning. Recent approaches have utilized this signal to learn correspondence estimation from close-by frame pairs. However, by relying only on close-by frame pairs, those approaches miss out on the richer long-range consistency between distant overlapping frames. To address this, we propose a self-supervised approach for correspondence estimation that learns from multiview consistency in short RGB-D video sequences. Our approach combines pairwise correspondence estimation and registration with a novel SE(3) transformation synchronization algorithm. Our key insight is that self-supervised multiview registration allows us to obtain correspondences over longer time frames, increasing both the diversity and difficulty of sampled pairs. We evaluate our approach on indoor scenes for correspondence estimation and RGB-D pointcloud registration and find that we perform on par with supervised approaches.
In this work we propose a novel token-based training strategy that improves Transformer-Transducer (T-T) based speaker change detection (SCD) performance. The conventional T-T based SCD model loss optimizes all output tokens equally. Due to the sparsity of the speaker changes in the training data, the conventional T-T based SCD model loss leads to sub-optimal detection accuracy. To mitigate this issue, we use a customized edit-distance algorithm to estimate the token-level SCD false accept (FA) and false reject (FR) rates during training and optimize model parameters to minimize a weighted combination of the FA and FR, focusing the model on accurately predicting speaker changes. We also propose a set of evaluation metrics that align better with commercial use cases. Experiments on a group of challenging real-world datasets show that the proposed training method can significantly improve the overall performance of the SCD model with the same number of parameters.
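The counting step behind token-level FA and FR rates can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain Levenshtein alignment rather than the customized edit distance the abstract mentions, the token `<st>` and all function names are hypothetical, and the normalisation of the two rates is a choice made here. How such counts enter a trainable loss is not shown.

```python
ST = "<st>"  # hypothetical speaker-change token


def align(ref, hyp):
    """Levenshtein alignment; returns (ref_token, hyp_token) pairs, None marks a gap."""
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Backtrace from the bottom-right corner, preferring match/substitution.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # reference token deleted
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # hypothesis token inserted
            j -= 1
    pairs.reverse()
    return pairs


def scd_counts(ref, hyp):
    """Count false accepts (spurious <st>) and false rejects (missed <st>)."""
    pairs = align(ref, hyp)
    fa = sum(1 for r, h in pairs if h == ST and r != ST)
    fr = sum(1 for r, h in pairs if r == ST and h != ST)
    return fa, fr


def weighted_scd_error(ref, hyp, w_fa=0.5, w_fr=0.5):
    """Weighted combination of FA and FR rates (normalisation is an assumption)."""
    fa, fr = scd_counts(ref, hyp)
    n_change = sum(1 for t in ref if t == ST)
    n_other = len(ref) - n_change
    fa_rate = fa / n_other if n_other else 0.0
    fr_rate = fr / n_change if n_change else 0.0
    return w_fa * fa_rate + w_fr * fr_rate


if __name__ == "__main__":
    ref = ["a", ST, "b", "c"]
    hyp = ["a", "b", ST, "c"]
    print(scd_counts(ref, hyp), weighted_scd_error(ref, hyp))
```

In this example the hypothesis places the speaker change one token late, which the alignment counts as one miss plus one false alarm.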
Autonomous underwater vehicles (AUVs) are becoming standard tools for underwater exploration and seabed mapping in both scientific and industrial applications \cite{graham2022rapid, stenius2022system}. Their capacity to dive untethered allows them to reach areas inaccessible to surface vessels and to collect data closer to the seafloor, regardless of the water depth. However, their navigation autonomy remains bounded by the accuracy of the dead reckoning (DR) estimate of their global position, which is severely limited in the absence of a priori maps of the area and of a GPS signal. Global localization systems equivalent to the latter exist for the underwater domain, such as LBL or USBL. However, they involve expensive external infrastructure and their reliability decreases with the distance to the AUV, making them unsuitable for deep sea surveys.
In this work, we estimate the depth at which domestic waste is located, as seen from a mobile robot in outdoor scenarios. Because this calculation covers a broad range (0.3 - 6.0 m), we fuse an RGB-D camera and LiDAR. For this purpose and range, we compare several methods, such as average, nearest, median and center point, applied to the points that lie inside a reduced or non-reduced Bounding Box (BB). These BBs are obtained from representative segmentation and detection methods: Yolact, SOLO, You Only Look Once (YOLO)v5, YOLOv6 and YOLOv7. Results show that applying a detection method with the average technique and a BB reduction of 40% returns the same output as segmenting the object and applying the average method. Moreover, the detection method is faster and lighter than the segmentation one. The median error in the conducted experiments was 0.0298 ${\pm}$ 0.0544 m.
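The four depth-selection rules compared above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes the range samples are already projected into the image as (u, v, depth) triples, that a "reduction of 40%" shrinks each side of the box by 40% about its centre, and that "center point" means the sample closest to the box centre. All names are hypothetical.

```python
from statistics import mean, median


def shrink_box(box, reduction):
    """Shrink (x0, y0, x1, y1) about its centre; reduction=0.4 keeps 60% of each side."""
    x0, y0, x1, y1 = box
    dx = (x1 - x0) * reduction / 2.0
    dy = (y1 - y0) * reduction / 2.0
    return (x0 + dx, y0 + dy, x1 - dx, y1 - dy)


def box_depth(points, box, method="average", reduction=0.0):
    """Estimate object depth from samples (u, v, depth_m) that fall inside the box.

    Returns None when no sample lies inside the (possibly reduced) box.
    """
    x0, y0, x1, y1 = shrink_box(box, reduction)
    inside = [(u, v, d) for u, v, d in points if x0 <= u <= x1 and y0 <= v <= y1]
    if not inside:
        return None
    depths = [d for _, _, d in inside]
    if method == "average":
        return mean(depths)
    if method == "median":
        return median(depths)
    if method == "nearest":
        return min(depths)
    if method == "center":
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        # Depth of the sample closest to the box centre in image coordinates.
        return min(inside, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)[2]
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    samples = [(50, 50, 2.0), (45, 55, 2.2), (10, 10, 5.0), (60, 40, 1.8)]
    bb = (0, 0, 100, 100)
    print(box_depth(samples, bb, "average"))                 # includes the background sample
    print(box_depth(samples, bb, "average", reduction=0.4))  # background sample excluded
```

Shrinking the box drops samples near its border, which in a detection box often belong to the background rather than the object; that is one plausible reason a reduced detection box could match a segmentation mask when averaging.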
Obtaining photorealistic reconstructions of objects from sparse views is inherently ambiguous and can only be achieved by learning suitable reconstruction priors. Earlier works on sparse rigid object reconstruction successfully learned such priors from large datasets such as CO3D. In this paper, we extend this approach to dynamic objects. We use cats and dogs as a representative example and introduce Common Pets in 3D (CoP3D), a collection of crowd-sourced videos showing around 4,200 distinct pets. CoP3D is one of the first large-scale datasets for benchmarking non-rigid 3D reconstruction "in the wild". We also propose Tracker-NeRF, a method for learning 4D reconstruction from our dataset. At test time, given a small number of video frames of an unseen object, Tracker-NeRF predicts the trajectories of its 3D points and generates new views, interpolating viewpoint and time. Results on CoP3D reveal significantly better non-rigid new-view synthesis performance than existing baselines.